Relemed: sentence-level search engine with relevance score for the MEDLINE database of biomedical articles

Authors

  • Mir S. Siadaty
  • Jianfen Shu
  • William A. Knaus
Abstract

BACKGROUND Receiving extraneous articles in response to a query submitted to MEDLINE/PubMed is common. When submitting a multi-word query (which is the majority of queries submitted), the presence of all query words within each article may be a necessary condition for retrieving relevant articles, but not sufficient. Ideally a relationship between the query words in the article is also required. We propose that if two words occur within an article, the probability that a relation between them is explained is higher when the words occur within adjacent sentences versus remote sentences. Therefore, sentence-level concurrence can be used as a surrogate for existence of the relationship between the words. In order to avoid the irrelevant articles, one solution would be to increase the search specificity. Another solution is to estimate a relevance score to sort the retrieved articles. However among the >30 retrieval services available for MEDLINE, only a few estimate a relevance score, and none detects and incorporates the relation between the query words as part of the relevance score.
RESULTS We have developed "Relemed", a search engine for MEDLINE. Relemed increases specificity and precision of retrieval by searching for query words within sentences rather than the whole article. It uses sentence-level concurrence as a statistical surrogate for the existence of relationship between the words. It also estimates a relevance score and sorts the results on this basis, thus shifting irrelevant articles lower down the list. In two case studies, we demonstrate that the most relevant articles appear at the top of the Relemed results, while this is not necessarily the case with a PubMed search. We have also shown that a Relemed search includes not only all the articles retrieved by PubMed, but potentially additional relevant articles, due to the extended 'automatic term mapping' and text-word searching features implemented in Relemed.
CONCLUSION By using sentence-level matching, Relemed can deliver higher specificity, thus eliminating more false-positive articles. By introducing an appropriate relevance metric, the most relevant articles on which the user wishes to focus are listed first. Relemed also shrinks the displayed text, and hence the time spent scanning the articles.
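
The idea of sentence-level concurrence described in the abstract can be illustrated with a minimal sketch. This is not Relemed's actual implementation: the function names, the naive sentence splitter, and the three-tier scoring (same sentence, adjacent sentences, anywhere in the text) are assumptions chosen only to show how co-occurrence distance could drive a relevance ordering.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Lowercased alphanumeric tokens as a set.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def cooccurrence_level(text, query_words):
    """Hypothetical tiers: 3 = all query words in one sentence,
    2 = covered by two adjacent sentences, 1 = all present somewhere,
    0 = at least one query word missing."""
    words = {w.lower() for w in query_words}
    sents = [tokenize(s) for s in split_sentences(text)]
    if any(words <= s for s in sents):
        return 3
    if any(words <= (a | b) for a, b in zip(sents, sents[1:])):
        return 2
    all_tokens = set().union(*sents) if sents else set()
    return 1 if words <= all_tokens else 0

def rank(articles, query_words):
    # articles: dict mapping title -> text. Higher tiers first; non-matches dropped.
    scored = [(cooccurrence_level(text, query_words), title)
              for title, text in articles.items()]
    return [title for level, title in sorted(scored, key=lambda p: -p[0]) if level > 0]

if __name__ == "__main__":
    demo = {
        "A": "Aspirin is widely used. Some unrelated sentence here. "
             "Another filler. Stroke risk was not measured.",
        "B": "Aspirin reduced stroke risk in this cohort.",
        "C": "Aspirin was given daily. Stroke incidence fell.",
        "D": "Nothing about the drug.",
    }
    print(rank(demo, ["aspirin", "stroke"]))  # prints ['B', 'C', 'A']
```

In this toy example, the article that mentions both words in a single sentence ranks first, the one that spreads them over adjacent sentences ranks second, and the one where they appear in remote sentences ranks last, mirroring the abstract's argument that closer co-occurrence is more likely to signal a stated relationship.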


Similar resources

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxono...

Learning SVM Ranking Functions from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain

Information overload is a well-known problem facing biomedical professionals. MEDLINE, the biomedical bibliographic database, adds hundreds of articles daily to the millions already in its collection. This overload is exacerbated by the lack of relevance-based ranking for search results, as well as disparate levels of search skill and domain experience of professionals using systems designed to...

Learning SVM Ranking Function from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain

Information overload is a well-known problem facing biomedical professionals. MEDLINE, the biomedical bibliographic database, adds hundreds of articles daily to the millions already in its collection. This overload is exacerbated by the lack of relevance-based ranking for search results, as well as disparate levels of search skill and domain experience of professionals using systems designed to...

Impact of Controlled and Free Language Use in Retrieving Articles from the ProQuest and Science Direct Databases

Abstract Introduction: The growth and expansion of the Internet has changed the way information is accessed and many facilities have been created on the Web to facilitate and expedite information locating. Objective: To identify the impact of keyword documentation using the medical thesaurus on the retrieval of articles from Proquest and Science Direct databases. Materials and Methods:The pr...

Journal:
  • BMC Medical Informatics and Decision Making

Volume 7

Publication date: 2007